Feat/ecsm accelerator#657

Open
jotabulacios wants to merge 28 commits into main from feat/ecsm-accelerator

Conversation

@jotabulacios

@jotabulacios jotabulacios commented Jun 10, 2026

Collaborator

This PR adds the ECSM precompile: secp256k1 scalar multiplication xR = (k·G).x. The multiplication is implemented in the executor (ecsm::scalar_mul_x from the new shared crypto/ecsm crate, verified against known test vectors) and proven by three chips:

  1. ECSM core (427 columns): ECALL receiver, one row per call, issues the MEMW reads of xG/k and the write of xR (timestamp-offset so addr_xR may alias addr_xG), witnesses yG and proves yG² ≡ xG³ + b
    through byte-limb convolutions (quotients q0/q1 + 64-entry carry arrays, range-checked via IS_BYTE/IS_HALF), checks 0 < k < N and xR < p with borrow chains, and starts the loop on the new SERVE_K / BIT / ECDAS buses.
  2. ECDAS step chip (521 columns, one row per double/add step): runs the double-and-add round loop as a self-referential ECDAS bus. Each row proves one curve double or add through the λ / xR / yR convolution relations (33-byte quotients with offset r = 3p, 64 carries each) and consumes scalar bits from the BIT bus.
  3. EC_SCALAR table (15 columns, 32 rows per call): walks the 32 bytes of k through a self-referential SERVE_K chain (one MEMW byte read per row) and serves the set bits on the BIT bus.
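
The round loop that the ECDAS chip proves is a double-and-add. Below is a minimal sketch of such a schedule, assuming an MSB-first scan of a little-endian 32-byte scalar. Curve points are replaced by integers modulo a small number, so "double" is `2·acc mod m` and "add" is `acc + g mod m`. The names `Step`, `schedule` and `toy_scalar_mul` are hypothetical, and the PR's actual bit order and step encoding may differ.

```rust
/// One step of the double-and-add loop (hypothetical names, not from the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Double,
    Add,
}

/// MSB-first schedule for a nonzero little-endian scalar.
/// The leading set bit initialises the accumulator to G; every lower bit
/// costs one Double, plus one Add when the bit is set.
/// Assumption: k != 0 (the PR text says the core chip checks 0 < k < N).
pub fn schedule(k_le: &[u8; 32]) -> Vec<Step> {
    let mut steps = Vec::new();
    let mut started = false;
    for &byte in k_le.iter().rev() {
        for bit in (0..8u32).rev() {
            let set = (byte >> bit) & 1 == 1;
            if !started {
                // Skip leading zeros; the first set bit means acc = G.
                started = set;
                continue;
            }
            steps.push(Step::Double);
            if set {
                steps.push(Step::Add);
            }
        }
    }
    steps
}

/// Toy stand-in for the curve: the additive group of integers mod m.
/// Assumes m < 2^62 so that 2 * acc and acc + g cannot overflow u64.
pub fn toy_scalar_mul(k_le: &[u8; 32], g: u64, m: u64) -> u64 {
    let g = g % m;
    let mut acc = g;
    for step in schedule(k_le) {
        acc = match step {
            Step::Double => (2 * acc) % m,
            Step::Add => (acc + g) % m,
        };
    }
    acc
}
```

For k = 5 (binary 101) the schedule is Double, Double, Add, which matches 5G = 2·(2·G) + G.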

Comment thread executor/src/vm/instruction/execution.rs
Comment thread crypto/ecsm/src/witness.rs Outdated
Comment thread executor/programs/asm/test_ecsm.s Outdated
@github-actions

github-actions Bot commented Jun 10, 2026


Benchmark Results for modified programs 🚀

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| head hashmap | 138.6 ± 3.3 | 134.9 | 144.1 | 1.00 |
| head keccak | 132.7 ± 6.4 | 126.7 | 147.1 | 1.00 |
| head syscall_commit | 91.2 ± 1.3 | 90.3 | 94.9 | 1.00 |

@claude

claude Bot commented Jun 10, 2026


Review: ECSM Accelerator

This PR adds a secp256k1 scalar-multiplication precompile (ECALL -3 / ECSM_SYSCALL_NUMBER) with a new ecsm crate for reference arithmetic + witness generation, executor integration, and three prover AIR chips (ECSM, ECDAS, EC_SCALAR). The code is well-structured and extensively tested.

Issues found

Medium – Off-by-7 in address-overflow check (inline comment on execution.rs:408-409)
ecsm_addr_ok(addr_xg, 24) and ecsm_addr_ok(addr_xr, 24) should use 31, not 24. load/store_u256_le accesses bytes up to addr + 31 (four 8-byte doublewords at offsets 0, 8, 16, 24). An address like 0xFFFF_FFE8 passes the 24-offset check, but the access at addr + 31 = 0x1_0000_0007 wraps the lower limb. addr_k already correctly uses 31.

Medium – assert! panics in the executor/prover hot path (inline comment on witness.rs:110-116)
limb_carries, shifted_quotient, and to_le_33 use assert! to enforce arithmetic invariants. If any fires at runtime (e.g., due to an implementation bug), it panics the whole prover process — a DoS. This code is reachable once per ECALL(-3). Consider propagating Result, or at minimum debug_assert! so release builds survive.

Low – Wrong expected value in assembly test comment (inline comment on test_ecsm.s:32)
The comment says the result should equal x(2G), but the scalar is k = 5, so it should say x(5G).

No issues found in

  • Curve and field arithmetic (curve.rs, field.rs): correct affine double/add, Fermat inverse, sqrt via a^{(p+1)/4}, canonical even-root selection.
  • Scalar range validation (prepare): correctly rejects k = 0 and k >= N.
  • ECDAS constraint polynomials: λ / xR / yR convolution carry recurrences match the spec; padding row satisfaction is tested.
  • ECSM constraint polynomials: µ-gating of the yG relation is correct and the spec deviation is documented with a dedicated test (spec2_unconditional_yg_constraint_unsatisfiable_on_padding).
  • SyscallNumbers::Ecsm = 94 placeholder discriminant is consistent with KeccakPermute = 0 and works correctly via TryFrom.
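
The square-root step named in the first bullet (sqrt via a^{(p+1)/4} with even-root selection) works for any prime p ≡ 3 (mod 4), which includes the secp256k1 field prime. Below is a minimal sketch over a toy prime, assuming "canonical" means the even root as the bullet says. The constant `P` and the function names are hypothetical and are not the crate's `recover_y_canonical`.

```rust
/// Toy prime with P % 4 == 3, standing in for the secp256k1 field prime.
const P: u64 = 10_007;

/// Square-and-multiply modular exponentiation (operands stay below P, so
/// the products fit comfortably in u64).
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Returns the even square root of `a` mod P, or None if `a` is a non-residue.
pub fn sqrt_even(a: u64) -> Option<u64> {
    let a = a % P;
    // For P ≡ 3 (mod 4), a^((P+1)/4) is a root whenever a root exists.
    let r = pow_mod(a, (P + 1) / 4);
    if r * r % P != a {
        return None;
    }
    // P is odd, so exactly one of r and P - r is even (for r != 0).
    Some(if r % 2 == 0 { r } else { P - r })
}
```

The final `r * r % P != a` check is what rejects an x coordinate that is not on the curve; without it the exponentiation would silently return a root of -a.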

@github-actions


Codex Code Review

Found two issues in the PR diff:

  1. Medium - bug: ECSM proofs fail for valid aliased xG/k inputs
    In trace_builder.rs, xG and k are both modeled as 32-byte reads at the same timestamp T, and each read updates memory_state immediately at line 685. If addr_xg and addr_k overlap, especially if they are equal, the second read sees old_ts == T, but MEMW requires old_ts < T. The executor permits this because it just loads xG then k with no overlap restriction at execution.rs. Either reject overlapping operand ranges in the executor/syscall contract, or assign distinct proof timestamps for these reads.
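
The first of the two remedies (rejecting overlapping operand ranges) could look like the sketch below. It assumes both operands are 32-byte ranges and that addresses do not wrap; `ranges_overlap` and `xg_k_disjoint` are hypothetical helper names, not functions from the PR.

```rust
/// True when the half-open ranges [a, a + len) and [b, b + len) intersect.
/// Assumption: addresses are far enough from u64::MAX that wraparound is
/// not a concern; saturating_add keeps the comparison well defined anyway.
pub fn ranges_overlap(a: u64, b: u64, len: u64) -> bool {
    a < b.saturating_add(len) && b < a.saturating_add(len)
}

/// Hypothetical executor-side guard: xG and k are each 32 bytes long.
pub fn xg_k_disjoint(addr_xg: u64, addr_k: u64) -> bool {
    !ranges_overlap(addr_xg, addr_k, 32)
}
```

Equal addresses and partially overlapping ranges are both rejected, while back-to-back buffers (`addr_k == addr_xg + 32`) remain valid.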

  2. Medium - crypto/input validation: non-canonical xG >= p is accepted
    prepare parses xG but never checks xg < p before calling recover_y_canonical at lib.rs. recover_y_canonical immediately reduces through Fp::new at curve.rs, so values like p + 1 are accepted as aliases for 1 when x = 1 is on-curve. For k = 1, scalar_mul_x returns the original non-canonical xG as xR, violating the expected field-coordinate range and conflicting with the prover’s xR < p constraint. Add an xg >= p rejection path.

I could not run the Rust tests because rustup attempted to write under read-only /home/runner/.rustup.

@github-actions

github-actions Bot commented Jun 11, 2026


Benchmark Results for unmodified programs 🚀

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| base binary_search | 65.0 ± 0.7 | 64.1 | 65.9 | 1.00 ± 0.02 |
| head binary_search | 64.8 ± 0.7 | 63.7 | 65.8 | 1.00 |
| base bitwise_ops | 65.9 ± 0.7 | 64.7 | 66.6 | 1.01 ± 0.02 |
| head bitwise_ops | 65.4 ± 0.8 | 64.7 | 66.5 | 1.00 |
| base fibonacci_26 | 69.0 ± 0.8 | 68.1 | 70.3 | 1.00 |
| head fibonacci_26 | 73.3 ± 4.1 | 69.3 | 81.0 | 1.06 ± 0.06 |
| base matrix_multiply | 68.0 ± 0.6 | 66.8 | 68.6 | 1.00 ± 0.01 |
| head matrix_multiply | 67.7 ± 0.8 | 66.8 | 68.7 | 1.00 |
| base modular_exp | 64.8 ± 0.7 | 64.1 | 65.8 | 1.00 |
| head modular_exp | 66.0 ± 0.6 | 64.9 | 66.9 | 1.02 ± 0.01 |
| base quicksort | 69.3 ± 0.5 | 67.8 | 69.8 | 1.00 |
| head quicksort | 69.5 ± 0.3 | 69.1 | 69.8 | 1.00 ± 0.01 |
| base sieve | 70.1 ± 0.9 | 69.1 | 71.3 | 1.00 |
| head sieve | 70.7 ± 0.7 | 69.1 | 71.3 | 1.01 ± 0.02 |
| base sum_array | 80.6 ± 0.9 | 79.2 | 81.8 | 1.00 |
| head sum_array | 81.2 ± 1.5 | 79.0 | 85.0 | 1.01 ± 0.02 |

jotabulacios and others added 2 commits June 11, 2026 16:49
Replaces the per-operation Fermat inversions in the double-and-add replay
with audited k256 (RustCrypto) projective arithmetic + batched inversion.
The witness generator is untrusted (the ECDAS chip re-proves every step),
so audited host-side arithmetic is sound here.

- curve.rs: `replay_double_and_add` now replays the schedule in k256
  `ProjectivePoint` (no per-op inversion), `batch_normalize`s every point to
  affine in one shot, and batch-inverts the slope denominators — two batched
  inversions instead of ~2·len_k Fermat modpows. The slope `λ` is precomputed
  here (new `StepPts.lambda` field) so the witness builder never inverts.
- lib.rs: `scalar_mul_x` (executor) uses k256's optimized scalar mul directly,
  skipping the step list entirely.
- witness.rs: `build_step` consumes the precomputed `s.lambda`.
- The BigUint reference (`point_double`/`point_add`/`step_lambda`/
  `replay_double_and_add_reference`) is kept `#[cfg(test)]` only — production
  ships k256 alone — and a parity test pins k256 == reference byte-for-byte
  across small/structured/large/near-order scalars.

k256 is host-side only (witness gen), never in the constraint system, and was
already a transitive workspace dependency. Replay micro-bench: ~5.9x faster
than the BigUint reference on a 256-bit scalar.

Follow-up (separate stage): port the field/curve primitives we need to drop
the num-bigint reference path entirely.
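
The batched inversion mentioned in the commit message is commonly done with Montgomery's trick: one modular inversion plus about 3n multiplications inverts n elements. Below is a minimal sketch over a toy 30-bit prime. It does not use k256's API, and `P`, `mul`, `inv` and `batch_inv` are hypothetical names chosen for illustration.

```rust
/// Toy prime below 2^30, so products of two reduced values fit in u64.
const P: u64 = 1_000_000_007;

fn mul(a: u64, b: u64) -> u64 {
    a * b % P
}

/// Single inversion via Fermat's little theorem: a^(P-2) mod P.
fn inv(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Montgomery's trick. Assumes every input is nonzero and already reduced mod P.
pub fn batch_inv(xs: &[u64]) -> Vec<u64> {
    // prefix[i] = xs[0] * ... * xs[i-1]  (empty product = 1)
    let mut prefix = Vec::with_capacity(xs.len());
    let mut acc = 1u64;
    for &x in xs {
        prefix.push(acc);
        acc = mul(acc, x);
    }
    // The only modular inversion: inverse of the product of all inputs.
    let mut inv_acc = inv(acc);
    let mut out = vec![0u64; xs.len()];
    for i in (0..xs.len()).rev() {
        // inv_acc = (xs[0] * ... * xs[i])^-1, so multiplying by prefix[i]
        // leaves xs[i]^-1.
        out[i] = mul(inv_acc, prefix[i]);
        // Peel xs[i] off to get (xs[0] * ... * xs[i-1])^-1 for the next round.
        inv_acc = mul(inv_acc, xs[i]);
    }
    out
}
```

A single zero input would zero the running product and corrupt every output, so callers need to know that all denominators are nonzero before batching.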
@jotabulacios jotabulacios marked this pull request as ready for review June 11, 2026 20:42
@github-actions


Codex Code Review

Findings

  • High - ECSM AIR does not enforce xG < p
    prover/src/tables/ecsm.rs range-checks x2, q0, yG, and q1, and prover/src/tables/ecsm.rs only adds overflow checks for k < N and xR < p. There is no equivalent xG < p check, even though the executor rejects non-canonical xG via prepare() at crypto/ecsm/src/lib.rs. For k > 1, xR < p does not imply canonical input: e.g. x = 1 is on secp256k1 and p + 1 still fits in 32 bytes, so a prover can use the aliased coordinate p + 1 and satisfy the modular curve/ECDAS equations, while the VM would have errored with CoordinateOutOfRange. Add an xG < p overflow/range check to the ECSM table, analogous to xR < p, and add a negative proof test for xG = p + 1, k = 2.
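
As a host-side analogue of the requested check, a byte-wise borrow chain decides x < p by subtracting p from x limb by limb and inspecting the final borrow. The sketch below assumes little-endian 32-byte coordinates. `p_le` and `lt_p_borrow_chain` are hypothetical names, and the actual AIR columns and range checks in the PR are not shown here.

```rust
/// secp256k1 field prime p = 2^256 - 2^32 - 977 as little-endian bytes.
pub fn p_le() -> [u8; 32] {
    let mut p = [0xFFu8; 32];
    p[0] = 0x2F;
    p[1] = 0xFC;
    p[4] = 0xFE;
    p
}

/// Returns true iff the little-endian value x is strictly below p.
/// Computes x - p one byte at a time; a borrow out of the top byte means
/// the subtraction went negative, i.e. x < p.
pub fn lt_p_borrow_chain(x_le: &[u8; 32]) -> bool {
    let p = p_le();
    let mut borrow = 0i16;
    for i in 0..32 {
        let d = x_le[i] as i16 - p[i] as i16 - borrow;
        // In a constraint system the difference byte d + 256 * borrow would
        // be range-checked as a byte and borrow constrained to be boolean.
        borrow = if d < 0 { 1 } else { 0 };
    }
    borrow == 1
}
```

With this check, the aliased coordinate p + 1 from the example above is rejected while the canonical value 1 is accepted.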

Verification

I attempted targeted tests, but the toolchain could not run in this sandbox:

error: could not create temp file /home/runner/.rustup/tmp/...: Read-only file system

Comment thread crypto/ecsm/src/lib.rs
bytes.resize(32, 0);
let mut out = [0u8; 32];
out.copy_from_slice(&bytes[..32]);
out


Low — silent truncation for values > 2^256

bytes.resize(32, 0) followed by bytes[..32] silently drops the high bytes if v is larger than 32 bytes. All current callers pass values validated to be < p < 2^256, so nothing is dropped today. But this is a pub function and a future caller passing an un-reduced intermediate (e.g. a product before % p) would get silently wrong output. A debug_assert!(v.bits() <= 256) at entry would catch the misuse in tests at zero release cost.
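
One way to make the truncation visible is sketched below. It takes the little-endian bytes directly instead of a `BigUint` so that it is self-contained, and it returns `Option` rather than using the `debug_assert!` suggested above. `to_le_32_checked` is a hypothetical name, not the function under review.

```rust
/// Copies a little-endian byte string into a fixed 32-byte array.
/// Returns None when the value does not fit in 256 bits, instead of
/// silently dropping the high bytes.
pub fn to_le_32_checked(bytes_le: &[u8]) -> Option<[u8; 32]> {
    // Any nonzero byte at index >= 32 means the value is >= 2^256.
    if bytes_le.iter().skip(32).any(|&b| b != 0) {
        return None;
    }
    let mut out = [0u8; 32];
    let n = bytes_le.len().min(32);
    out[..n].copy_from_slice(&bytes_le[..n]);
    Some(out)
}
```

Inputs shorter than 32 bytes are zero-padded at the high end, and longer inputs are accepted only when the extra bytes are all zero.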

Comment on lines +408 to +410
if !ecsm_addr_ok(addr_xg, 24)
|| !ecsm_addr_ok(addr_xr, 24)
|| !ecsm_addr_ok(addr_k, 31)


Clarity — asymmetric max offsets (24 vs 31) need a comment

The different values are correct but look like a copy-paste bug at first glance:

  • addr_xg/addr_xr use 24: the MEMW bus addresses doublewords by their 8-byte-aligned base; the last doubleword's base is at addr + 8*3 = addr + 24, so only that offset needs to stay within the 32-bit low limb.
  • addr_k uses 31: EC_SCALAR issues individual byte reads at addr_k + offset for each byte offset ∈ 0..31, so the last byte's address addr_k + 31 must fit.

A short inline comment on each call would remove the ambiguity.
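
The two bounds described in this comment can be illustrated with a small sketch. It assumes the check only requires that the low 32-bit limb of the address does not overflow when the maximum offset is added. `low_limb_fits` is a hypothetical stand-in, and the real `ecsm_addr_ok` may be defined differently.

```rust
/// Hypothetical stand-in for the address check: true when adding
/// `max_offset` to the low 32-bit limb of `addr` does not overflow it.
pub fn low_limb_fits(addr: u64, max_offset: u32) -> bool {
    let low = addr as u32; // keep only the low 32-bit limb
    low.checked_add(max_offset).is_some()
}

/// Doubleword operands (xG, xR): the last 8-byte word starts at addr + 8 * 3.
pub fn doubleword_operand_ok(addr: u64) -> bool {
    low_limb_fits(addr, 24)
}

/// Byte-wise operand (k): the last byte read is at addr + 31.
pub fn byte_operand_ok(addr: u64) -> bool {
    low_limb_fits(addr, 31)
}
```

Under this assumption an address such as `0xFFFF_FFE7` is acceptable for a doubleword operand (last base at `0xFFFF_FFFF`) but not for the byte-wise operand, whose last byte would fall past the limb.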

Comment thread crypto/ecsm/src/field.rs
/// Multiplicative inverse via Fermat's little theorem (`p` is prime): `self^(p-2)`.
/// Returns zero for a zero input (which never occurs for valid curve arithmetic).
pub fn inv(&self) -> Fp {
Fp(self.0.modpow(&(p() - BigUint::from(2u32)), &p()))


Low — not constant-time

modpow is not constant-time, so the execution time of inv (and by extension, every λ computation in build_step) varies with the field values. Combined with the scalar-bit branching in replay_double_and_add, the entire ECSM computation leaks k via timing.

This is acceptable for a ZK prover (the prover is trusted with k; the verifier never observes prover wall-time), but a doc comment noting the non-CT behaviour would warn off any future caller that uses this for non-proving purposes (e.g., ECDSA nonce generation).

Comment thread crypto/ecsm/src/lib.rs Outdated
@jotabulacios jotabulacios force-pushed the feat/ecsm-accelerator branch from b2def07 to 29e2bb5 Compare June 12, 2026 20:02

@MauroToscano MauroToscano left a comment

Contributor


Let's wait until Tuesday for possible changes in the spec, but this lgtm
